Overview of the Mixed Script Information Retrieval (MSIR) at FIRE-2016
Abstract
The shared task on Mixed Script Information Retrieval (MSIR) was organized for the fourth year at FIRE-2016. The track had two subtasks. Subtask-1 was on question classification, where the questions were in code-mixed Bengali-English and the Bengali was written in transliterated Roman script. Subtask-2 was on ad-hoc retrieval of Hindi film song lyrics, movie reviews and astrology documents, where both the queries and the documents were in Hindi, written either in Devanagari script or in Roman transliteration. A total of 33 runs were submitted by 9 participating teams: 20 runs for subtask-1 by 7 teams and 13 runs for subtask-2 by 7 teams. This overview presents a comprehensive report of the subtasks, the datasets and the performance of the submitted runs.
Similar resources
NLP-NITMZ @ MSIR 2016 System for Code-Mixed Cross-Script Question Classification
This paper describes our approach to the Code-Mixed Cross-Script Question Classification task, which is subtask 1 of MSIR 2016. MSIR is a Mixed Script Information Retrieval event held in conjunction with FIRE 2016, the 8th meeting of the Forum for Information Retrieval Evaluation. For this task, our team NLP-NITMZ submitted three system runs: i) using a direct feature set; ii) using dire...
DA-IICT in FIRE 2015 Shared Task on Mixed Script Information Retrieval
This paper describes the methodology followed by Team Watchdogs in their submission to the shared task on Mixed Script Information Retrieval (MSIR) at FIRE 2015. I participated in subtasks 1 (Query Word Labelling) and 2 (Mixed-script Ad hoc retrieval). For subtask 1, a machine learning approach using a CRF classifier was used to classify the tokens as one of the possible languages using ...
Amrita-CEN@MSIR-FIRE2016: Code-Mixed Question Classification using BoWs and RNN Embeddings
Question classification is a key task in many question answering applications. Nearly all previous work on question classification has used machine learning and knowledge-based methods. This working note presents an embedding-based Bag-of-Words method and a Recurrent Neural Network to achieve automatic question classification in code-mixed Bengali-English text. We build two systems that clas...
ISM@FIRE-2015: Mixed Script Information Retrieval
This paper describes the approach we used to identify the languages of a set of terms written in Roman script, and our approaches to retrieval in the mixed-script domain, at FIRE-2015. The first approach identifies the class of the given terms/words (the native language of a term, and whether a term is a named entity or of some other type). MaxEnt, a supervised classifier, has been used for the cl...
Exploiting Named Entity Mentions Towards Code Mixed IR : Working Notes for the UB system submission for MSIR@FIRE-2016
A sizable percentage of online user-generated content is susceptible to code switching and code mixing, for a variety of reasons. An expected consequence is that ad hoc user queries on such data are also inherently code mixed. This paper presents our solution for such a scenario: information retrieval on code-mixed Hindi-English tweets. We explore techniques in information ext...